Implement lazy loading for inline Arrow results #1029

jayantsing-db · 2025-09-30T08:55:33Z

Description

This PR introduces lazy loading support for inline Arrow results to improve memory efficiency when handling large result sets.

Previously, InlineChunkProvider eagerly fetched all Arrow batches up front when results had hasMoreRows = true, which could cause memory issues with large datasets. This change splits the handling into two separate paths:

Lazy path (new): For Thrift-based inline Arrow results (when ARROW_BASED_SET is returned), we now use LazyThriftInlineArrowResult, which fetches Arrow batches on demand as the client iterates through rows. This is similar to how LazyThriftResult works for columnar data.
Remote path (existing): For URL-based Arrow results (URL_BASED_SET), we continue using ArrowStreamResult with RemoteChunkProvider, which downloads chunks from cloud storage.
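
The on-demand behaviour of the lazy path can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `BatchFetcher`, `Batch`, and `LazyBatchRows` are hypothetical names, rows are plain strings instead of Arrow record batches, and `fetchNext()` stands in for whatever server round trip the driver actually makes.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for one server round trip that returns the next batch. */
interface BatchFetcher {
  Batch fetchNext();
}

/** Hypothetical batch: decoded rows plus the server's hasMoreRows flag. */
final class Batch {
  final List<String> rows;
  final boolean hasMoreRows;

  Batch(List<String> rows, boolean hasMoreRows) {
    this.rows = rows;
    this.hasMoreRows = hasMoreRows;
  }
}

/** Iterates rows while holding at most one batch in memory at a time. */
final class LazyBatchRows implements Iterator<String> {
  private final BatchFetcher fetcher;
  private Iterator<String> current;
  private boolean hasMoreRows;

  LazyBatchRows(Batch first, BatchFetcher fetcher) {
    this.fetcher = fetcher;
    this.current = first.rows.iterator();
    this.hasMoreRows = first.hasMoreRows;
  }

  @Override
  public boolean hasNext() {
    // Fetch only when the current batch is exhausted and the server reported more rows.
    while (!current.hasNext() && hasMoreRows) {
      Batch next = fetcher.fetchNext();
      current = next.rows.iterator();
      hasMoreRows = next.hasMoreRows;
    }
    return current.hasNext();
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}
```

An eager provider would instead loop on `fetchNext()` until `hasMoreRows` is false before returning the first row, which is what makes memory grow with the size of the result set.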

The InlineChunkProvider is now used only for SEA results with JSON_ARRAY format and INLINE disposition, which contain all data inline (no hasMoreRows flag is set).
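
For reviewers, the Thrift-side routing described above could be summarised as a small dispatch. This is only a sketch under assumptions: `ThriftResultFormat` and `ResultPathSelector` are hypothetical names, and the method returns the handler's name as a string instead of constructing the real classes.

```java
/** Hypothetical enum mirroring the two Thrift result formats named in this PR. */
enum ThriftResultFormat {
  ARROW_BASED_SET,
  URL_BASED_SET
}

/** Hypothetical selector; returns handler names only, for illustration. */
final class ResultPathSelector {
  private ResultPathSelector() {}

  static String handlerFor(ThriftResultFormat format) {
    switch (format) {
      case ARROW_BASED_SET:
        // Lazy path (new): batches are fetched as the client iterates.
        return "LazyThriftInlineArrowResult";
      case URL_BASED_SET:
        // Remote path (existing): chunks are downloaded from cloud storage.
        return "ArrowStreamResult with RemoteChunkProvider";
      default:
        throw new IllegalArgumentException("Unhandled format: " + format);
    }
  }
}
```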

As with #975, this should reduce memory consumption and improve performance when dealing with large inline Arrow result sets.

Testing

Unit tests
Integration tests
Manual testing


jayantsing-db · 2025-09-30T08:57:35Z

I need to make some changes related to the JDBC spec around row count, because the total row count is not available while results are being fetched lazily.
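
One possible way to model this, shown only as a sketch and not as the planned change: track the rows read so far and expose the total only once the last batch has been consumed. `LazyRowCounter` and its methods are hypothetical names.

```java
import java.util.OptionalLong;

/** Hypothetical helper: the total is unknown until the lazy result is exhausted. */
final class LazyRowCounter {
  private long rowsSeen = 0;
  private boolean exhausted = false;

  /** Call once for each row handed to the client. */
  void onRowRead() {
    rowsSeen++;
  }

  /** Call when the server reports that no more rows remain. */
  void onEndOfResults() {
    exhausted = true;
  }

  /** 1-based number of the most recently read row, or 0 before the first row. */
  long currentRow() {
    return rowsSeen;
  }

  /** Empty while more batches may still arrive; the final count afterwards. */
  OptionalLong totalRows() {
    return exhausted ? OptionalLong.of(rowsSeen) : OptionalLong.empty();
  }
}
```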

github-actions · 2025-10-31T06:41:28Z

This PR has been marked as Stale because it has been open for 30 days with no activity. If you would like the PR to remain open, please remove the stale label or comment on the PR.

jayantsing-db and others added 2 commits September 30, 2025 14:15

Merge branch 'main' into jayantsing-db/inline-arrow-lazy

a00b042

jayantsing-db requested a review from gopalldb September 30, 2025 08:57

github-actions bot added the Stale label Oct 31, 2025
